A Short Introduction to Kolmogorov Complexity

Author

  • Volker Nannen
Abstract

This is a short introduction to Kolmogorov complexity and information theory. The interested reader is referred to the literature, especially the textbooks [CT91] and [LV97], which cover the fields of information theory and Kolmogorov complexity in depth and with all the necessary rigor. They are easy to read and require only a minimum of prior knowledge.

Kolmogorov complexity. Also known as algorithmic complexity and Turing complexity. Though Kolmogorov was not the first to formulate the idea, he played the dominant role in the consolidation of the theory. The concept itself was developed independently and with different motivations by Andrei N. Kolmogorov [Kol65], Ray Solomonoff [Sol64] and Gregory Chaitin [Cha66], [Cha69]. The Kolmogorov complexity C(s) of any binary string s ∈ {0, 1}^n is the length of the shortest computer program s∗ that can produce this string on the Universal Turing Machine (UTM) and then halt. In other words, on the UTM, C(s) bits of information are needed to encode s. The UTM is not a real computer but an imaginary reference machine, and we do not need its specific details. As every Turing machine can be implemented on every other one, the minimum length of a program on one machine differs from the minimum length of the program on any other machine by at most a constant. This constant is the length of the implementation of the first machine on the other machine and is independent of the string in question. This was first observed in 1964 by Ray Solomonoff.

Experience has shown that every attempt to construct a theoretical model of computation more powerful than the Turing machine has produced something that is at most as powerful as the Turing machine. This was codified in 1936 by Alonzo Church as Church’s Thesis: the class of algorithmically computable numerical functions coincides with the class of partial recursive functions.
Everything we can compute we can compute by a Turing machine, and what we cannot compute by a Turing machine we cannot compute at all. That said, we can use Kolmogorov complexity as a universal measure that assigns the same value to any sequence of bits regardless of the model of computation, up to an additive constant.

∗ From The Paradox of Overfitting, [Nan03]. arXiv:1005.2400v2 [cs.CC] 14 May 2010.

Incomputability of Kolmogorov complexity. Kolmogorov complexity is not computable. It is nevertheless essential for proving existence and bounds for weaker notions of complexity. The fact that Kolmogorov complexity cannot be computed stems from the fact that we cannot compute the output of every program. More fundamentally, no algorithm can decide for every program whether it will ever halt, as Alan Turing showed in his famous work on the halting problem [Tur36]. No computer program is possible that, when given any other computer program as input, will always output true if that program will eventually halt and false if it will not. Even if we have a short program that outputs our string and that seems to be a good candidate for being the shortest such program, there are always shorter programs for which we do not know whether they will ever halt, and with what output.

Plain versus prefix complexity. Turing’s original model of computation included special delimiters that marked the end of an input string. This has resulted in two brands of Kolmogorov complexity:

  • plain Kolmogorov complexity: the length C(s) of the shortest binary string that is delimited by special marks and that can compute s on the UTM and then halt.
  • prefix Kolmogorov complexity: the length K(s) of the shortest binary string that is self-delimiting [LV97] and that can compute s on the UTM and then halt.
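The self-delimiting requirement can be made concrete with a small sketch. The encoding below is an illustrative scheme chosen for simplicity, not the construction used in [LV97], and the names self_delimiting and decode_first are hypothetical. It writes the payload length in binary, preceded by a unary count of the length bits, so a decoder can find the end of the string without any end marker.

```python
from typing import Tuple


def self_delimiting(payload: str) -> str:
    """Prefix a bit string with its own length so no end marker is needed.

    Layout (illustrative assumption, not taken from the source):
      1. number of length bits in unary: that many 1s, then a closing 0
      2. the payload length in binary
      3. the payload itself
    """
    length_bits = format(len(payload), "b")
    return "1" * len(length_bits) + "0" + length_bits + payload


def decode_first(stream: str) -> Tuple[str, str]:
    """Split a stream into its first self-delimited payload and the rest."""
    width = stream.index("0")            # unary part gives the width of the length field
    start = width + 1                    # first bit of the binary length
    length = int(stream[start:start + width], 2)
    body = start + width                 # first bit of the payload
    return stream[body:body + length], stream[body + length:]


# Two codewords can be concatenated and still be separated unambiguously.
stream = self_delimiting("10110") + self_delimiting("0011")
first, rest = decode_first(stream)
second, tail = decode_first(rest)
print(first, second, repr(tail))
```

In this sketch the overhead is twice the number of bits in the binary length plus one, so it grows logarithmically with the length of the payload.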
The difference between the two is logarithmic in C(s): the number of extra bits needed to delimit the input string. While plain Kolmogorov complexity integrates neatly with the Turing model of computation, prefix Kolmogorov complexity has a number of desirable mathematical characteristics that make it a more coherent theory. The individual advantages and disadvantages are described in [LV97]. Which one is actually used is a matter of convenience. We will mostly use the prefix complexity K(s).

Individual randomness. A. N. Kolmogorov was interested in Kolmogorov complexity as a way to define the individual randomness of an object. When s has no computable regularity it cannot be encoded by a program shorter than s. Such a string is truly random and its Kolmogorov complexity is the length of the string itself plus the command print.¹ And indeed, strings with a Kolmogorov complexity close to their actual length satisfy all known tests of randomness. A regular string, on the other hand, can be computed by a program much shorter than the string itself. But the overwhelming majority of all strings of any length are random, and for a string picked at random the chance is exponentially small that its Kolmogorov complexity will be significantly smaller than its actual length.

This can easily be shown. For any given integer n there are exactly 2^n binary strings of that length and 2^n − 1 strings that are shorter than n: one empty string, 2 strings of length one, 2^2 of length two and so forth. Even if all strings shorter than n produced a string of length n on the UTM, we would still ...

¹ Plus a logarithmic term if we use prefix complexity.

Similar resources

A Brief Introduction to Kolmogorov Complexity

In these notes we give a brief introduction to Kolmogorov complexity. The notes are based on the two talks given at KAM Spring School, Borová Lada, 2006.

Kolmogorov Complexity and the Incompressibility Method

1. Introduction. What makes one object more complex than another? Kolmogorov complexity, or program-size complexity, provides one of many possible answers to this fundamental question. In this theory, whose foundations have been developed independently by R. the complexity of an object is defined as the length of its shortest effective description, which is the minimum number of symbols that mu...

On the Kolmogorov-Chaitin Complexity for short sequences

A drawback to Kolmogorov-Chaitin complexity (K) is that it is uncomputable in general, and that limits its range of applicability. Moreover when strings are short, the dependence of K on a particular universal Turing machine U can be arbitrary. In practice one can approximate it by computable compression methods. However, such compression methods do not provide a good approximation for short se...

Kolmogorov Complexity Estimation and Analysis

Methods for discerning and measuring Kolmogorov Complexity are discussed and their relationships explored. A computationally efficient method of using the Lempel-Ziv 78 universal compression algorithm to estimate complexity is introduced. ...

Journal:
  • CoRR

Volume: abs/1005.2400

Publication date: 2003